Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: N. Balakumar, R. Amsaveni, K. Brindha, S. Sasikala Devi, N. Manoharan, Mani Gopalsamy, L. Sivakami
DOI Link: https://doi.org/10.22214/ijraset.2024.64911
Machine learning is commonly divided into supervised learning, which relies on labeled data, and unsupervised learning, which addresses data without labels. Reinforcement learning, in turn, develops agents that learn optimal actions from past experience. Transfer learning, distinctively, enhances learning in a target task by leveraging knowledge from related source tasks. In recent years, many machine learning models have remained single-task focused. This paper explores scenarios in which transfer learning can be applied effectively from a source domain to a target domain, employing feature selection, extraction, and construction techniques. These methods transform the data into a new feature space by generating a refined subset of the original features.
I. INTRODUCTION
Deep learning, also known as deep structured or hierarchical learning, is a subset of machine learning that centers on representation learning as opposed to traditional task-specific algorithms. Deep learning models, which include deep neural networks (DNNs), deep belief networks (DBNs), and recurrent neural networks (RNNs), are pivotal in supervised, semi-supervised, and unsupervised learning approaches. These models have achieved remarkable success across a range of applications, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, and even drug discovery. In many of these fields, deep learning methods have either matched or surpassed human expertise in performance. The predominant architectures in deep learning encompass Deep Boltzmann Machines (DBMs), DBNs, Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and RNNs. While inspired by biological neural systems, deep learning structures diverge considerably from actual neurobiological patterns, limiting direct applicability to neuroscience.
Transfer learning addresses an important limitation in machine learning by focusing on reusing knowledge gained in one task to enhance performance in a related but distinct task. For instance, knowledge of recognizing cars could aid in identifying trucks. Though connected conceptually to the transfer of learning in cognitive psychology, formal cross-disciplinary connections are sparse. The paper is structured as follows: Section II explores deep learning, with sub-sections on supervised and unsupervised learning and their key architectures, including CNNs, ANNs, and RNNs in supervised learning, and DBNs, Autoencoders, Generative Adversarial Networks (GANs), Self-Organizing Maps (SOMs), and DBMs in unsupervised learning. Section III delves into Transfer Learning, detailing its forms: Inductive, Transductive, and Unsupervised Transfer Learning. Section IV discusses Transfer Learning's characteristics, especially how it impacts base and target datasets across varying domains and data sizes. Section V concludes with insights on Transfer Learning's effects and potential future directions.
II. DEEP LEARNING
Introduced to machine learning by Rina Dechter in 1986 and to artificial neural networks by Igor Aizenberg in 2000, deep learning's foundation was laid with multilayer perceptrons for supervised tasks, initially demonstrated by Alexey Ivakhnenko and V.G. Lapa in 1965. Since then, deep learning has evolved, with Kunihiko Fukushima's 1980 Neocognitron being an early model for computer vision. In 1989, Yann LeCun’s application of backpropagation in deep neural networks advanced handwritten ZIP code recognition, though training required extensive time [4].
Transfer learning was first explored by Lorien Pratt in 1993 with the discriminability-based transfer (DBT) algorithm and later evolved in 1997 with multi-task learning theories, which were formalized in Learning to Learn by Pratt and Sebastian Thrun in 1998. The reuse of neural networks, an important application in cognitive science, was also recognized in a 1996 special issue of Connection Science. Our study benefited from research carried out on advanced breast cancer detection approaches using deep learning [35-37].
A. Supervised Learning
Supervised learning develops a function that maps inputs to outputs based on labeled training data [8]. The algorithm aims to generalize from examples to predict unseen data accurately. In supervised learning, core architectures include Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Recurrent Neural Networks (RNNs).
B. Unsupervised Learning
Unsupervised learning identifies patterns in unlabelled data, often for clustering or dimensionality reduction. Unlike supervised learning, it lacks straightforward accuracy metrics.
III. TRANSFER LEARNING
Transfer learning optimizes training on a new task by utilizing knowledge from previously learned tasks, significantly reducing the computational cost and improving model accuracy and robustness [19, 23]. This is achieved by adapting parameters, features, or entire models learned from a source domain to a target domain, where acquiring large datasets for the target domain may be challenging. Given its ability to enhance generalization in new environments, transfer learning has become central in machine learning research and applications, particularly where data scarcity and time efficiency are crucial constraints.
A. Inductive Transfer Learning
In inductive transfer learning, the source and target tasks differ, but the model’s underlying knowledge is transferable to help optimize performance on the new task. This approach is useful when labeled data in the target domain is available, enabling the model to better generalize to unseen scenarios in the target task by fine-tuning both general and task-specific features.
B. Transductive Transfer Learning
Transductive transfer learning is suitable when only unlabeled target data is available during source training. Here, the model uses domain similarity to improve performance, making this approach relevant in domains where labeling new data is expensive or infeasible. Transductive transfer focuses on learning domain-invariant representations that can generalize well across both source and target domains.
C. Unsupervised Transfer Learning
In unsupervised transfer learning, neither the source nor the target domain contains labeled data. The model identifies structural similarities between tasks, often for clustering, feature extraction, or anomaly detection. This is particularly useful for feature extraction in domains where labeled data is scarce but unlabeled data is available.
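As a minimal sketch of this idea (assuming a torchvision ResNet-18 backbone and scikit-learn's KMeans; the random tensors below stand in for unlabeled target images, and the cluster count is an arbitrary choice):

```python
# Sketch: unsupervised transfer - reuse a pre-trained encoder as a fixed
# feature extractor, then cluster the unlabeled target data.
import torch
from torchvision import models
from sklearn.cluster import KMeans

encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()          # output penultimate features, not class scores
encoder.eval()

images = torch.randn(32, 3, 224, 224)     # stand-in for unlabeled target images
with torch.no_grad():
    features = encoder(images).numpy()

# Group the transferred features; the number of clusters is task-dependent.
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(features)
print(clusters)
```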
D. Transfer Learning in Practice
Given the diversity of real-world applications, transfer learning has become an essential tool across fields such as image and speech recognition, predictive maintenance, robotics, and beyond.
E. Technical Advancements Driving Transfer Learning
Advancements in deep learning frameworks and methodologies have enhanced transfer learning by optimizing both the efficiency and effectiveness of knowledge transfer.
F. Implications and Future Directions in Transfer Learning
Transfer learning is increasingly recognized for its capacity to bridge the gap between data-rich and data-scarce environments, addressing real-world challenges by reducing dependency on large labeled datasets. Its applications in low-resource settings, like healthcare and sustainability, demonstrate its potential for impactful contributions to society. Future research is exploring meta-transfer learning, where models are trained to quickly adapt to new tasks with minimal data, and few-shot learning, which enables effective learning with a few samples, further broadening the scope and feasibility of transfer learning across diverse applications.
IV. TRANSFER LEARNING CHARACTERISTICS
Transfer learning aims to repurpose a pre-trained model on a new but related task. Optimally selecting features and relevant variables is essential for improving model efficiency and performance. Feature selection algorithms, including filter, wrapper, and embedded methods, are commonly employed to refine data in both base and target datasets. Feature selection improves model performance by reducing overfitting, cutting down training time, and enhancing accuracy.
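As one concrete illustration of the filter family of methods, the sketch below uses scikit-learn's SelectKBest with mutual information; the synthetic data and k=10 are arbitrary choices for demonstration, not values from the paper:

```python
# Sketch of filter-based feature selection with scikit-learn.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # 200 samples, 50 candidate features
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # labels driven by features 0 and 3

# Filter method: rank features by mutual information with the label and
# keep the k highest-scoring ones before training any model.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                      # (200, 10)
print(selector.get_support(indices=True))   # indices of the retained features
```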
A. Datasets
In transfer learning, a dataset is considered similar if the source and target domains align closely in structure or content. For example, the Places 205 Database (http://places.csail.mit.edu/) includes 2.5 million images across 205 scene categories, making it ideal as a base dataset for scene classification tasks. Transfer learning scenarios involving the Places 205 Database often use related target datasets, such as CS-280 Mini Places or Places365, which contain subsets that retain structural similarity. A contrasting example is ImageNet, which serves as a base dataset for various small target datasets, such as ImageNet8x8, ImageNet16x16, ImageNet32x32, and ImageNet64x64, each maintaining the same number of training images but differing in resolution. Smaller datasets like CIFAR-10, CIFAR-100, and STL-10 are also popular for transfer learning due to their manageable sizes (10–100 classes). Generally, when labeled data is scarce, feature engineering and transfer techniques become invaluable for model generalization.
B. Data Diversity
Data diversity plays a crucial role in determining if a dataset is suitable for transfer learning. For instance, datasets within a single industry (e.g., healthcare) might have high domain-specific alignment, but they may also have subtle links to other sectors, such as transportation (logistics of medical supplies) and banking (financial transactions in healthcare). Cross-domain data fusion, such as utilizing information from both healthcare and manufacturing sectors, could provide new perspectives for model training.
C. Dataset Size
In transfer learning, dataset size significantly impacts model training strategies. A "small" dataset typically has less than 25–30% of the classes compared to the base dataset. For instance, if using Places 205 as a base, any dataset with fewer than 50 scene classes would be considered small. When the sample size per class exceeds 10,000 images, the dataset is considered large. Additionally, training datasets usually need to comprise at least 70% of the base dataset, ideally containing over 100 classes for effective knowledge transfer.
D. Parameter Sharing
Parameter sharing allows the reuse of model parameters when processing target datasets of varying spatial dimensions. In transfer learning, convolutional and pooling layers can operate on inputs of different sizes without requiring extensive retraining, allowing the pretrained model to adapt to different spatial dimensions effectively.
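A minimal PyTorch sketch of this behaviour: the same convolutional weights process inputs of different spatial sizes, and adaptive pooling (an assumption of this sketch, not a method prescribed by the paper) yields a fixed-length output either way:

```python
# Sketch: shared convolutional weights operate on inputs of different
# spatial sizes; adaptive pooling produces a fixed-length vector in both cases.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # collapses any H x W feature map to 1 x 1
    nn.Flatten(),
)

small = torch.randn(1, 3, 64, 64)
large = torch.randn(1, 3, 224, 224)
print(backbone(small).shape, backbone(large).shape)  # both torch.Size([1, 16])
```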
E. Ensemble Methods
When there are distributional differences between training and test datasets, ensemble transfer learning techniques can enhance model performance. Ensemble methods integrate multiple models, thus improving classification accuracy by mitigating the effect of data insufficiencies, a common challenge in transfer learning scenarios with limited target data.
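A minimal sketch of prediction averaging across an ensemble, using two ImageNet-pretrained torchvision models purely as stand-ins for fine-tuned ensemble members; the random batch replaces real target-domain images:

```python
# Sketch: average the softmax outputs of several pre-trained/fine-tuned
# models to soften the effect of limited or shifted target data.
import torch
from torchvision import models

ensemble = [
    models.resnet18(weights=models.ResNet18_Weights.DEFAULT),
    models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT),
]
for m in ensemble:
    m.eval()

images = torch.randn(4, 3, 224, 224)   # stand-in for a target-domain batch
with torch.no_grad():
    probs = torch.stack([torch.softmax(m(images), dim=1) for m in ensemble])
prediction = probs.mean(dim=0).argmax(dim=1)   # consensus of the ensemble
print(prediction)
```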
G. Data Epoch and Batch Processing
To handle large datasets, transfer learning models often divide data into batches, enabling sequential processing and weight updates. Batch sizes, iterations, and epochs control how many times the model adjusts based on data segments, allowing models to generalize well across large target datasets even when computational resources are limited.
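A minimal sketch of this batch-and-epoch loop in PyTorch, with a synthetic dataset and an arbitrary linear model standing in for the transferred network; the batch size and epoch count are illustrative:

```python
# Sketch: the dataset is split into batches and the weights are updated
# once per batch, repeated for a fixed number of epochs.
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 20)                      # stand-in target dataset
y = torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = torch.nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(5):                          # 5 passes over the data
    for batch_X, batch_y in loader:             # one weight update per batch
        optimizer.zero_grad()
        loss = loss_fn(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()
```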
H. Dataset Mapping
The effectiveness of transfer learning depends on the size of the target dataset and its similarity to the base dataset. Best practice distinguishes six common scenarios, each expanded below with the technical steps for implementing it based on these dataset characteristics:
1) Scenario 1: Small, Similar Target Dataset to the Base Training Dataset
When working with a small target dataset that is similar to the base dataset, the approach focuses on maximizing data efficiency without overfitting, given the limited amount of target data. A common strategy is feature extraction, which leverages the pre-trained model's layers, especially the early and intermediate layers that contain generalized features learned from the base dataset. The process typically includes the following steps:
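As a minimal sketch of this feature-extraction strategy (assuming a torchvision ResNet-50 backbone and a scikit-learn logistic-regression head; the random tensors stand in for the small target dataset):

```python
# Sketch: use the frozen pre-trained backbone as a feature extractor and
# train only a lightweight classifier on the small, similar target set.
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()      # expose 2048-d penultimate features
backbone.eval()
for p in backbone.parameters():        # freeze every pre-trained weight
    p.requires_grad = False

# Stand-ins for a small target dataset; real images and labels would go here.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 5, (64,))

with torch.no_grad():
    feats = backbone(images).numpy()

# Only this small classifier is trained, which limits overfitting.
clf = LogisticRegression(max_iter=1000).fit(feats, labels.numpy())
```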
2) Scenario 2: Large, Similar Target Dataset to the Base Training Dataset
When the target dataset is large and similar to the base dataset, the focus is on optimizing for computational efficiency while taking advantage of the robust feature similarity. Here, the approach may involve deeper layers, as the larger dataset allows more extensive fine-tuning without overfitting.
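A minimal sketch under stated assumptions (a torchvision ResNet-50, a hypothetical 100-class target task, and a deliberately small learning rate for fine-tuning the deepest block):

```python
# Sketch: with a large, similar target set, unfreeze the deepest block and
# the classifier head and fine-tune them with a small learning rate.
import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False            # start from a fully frozen network
for p in model.layer4.parameters():    # deepest residual block
    p.requires_grad = True
model.fc = torch.nn.Linear(model.fc.in_features, 100)  # hypothetical 100 classes

# Low learning rate so the pre-trained weights are adjusted, not overwritten.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```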
3) Scenario 3: Small, Different Target Dataset from the Base Training Dataset
A small target dataset that differs significantly from the base dataset poses a unique challenge. Here, the pre-trained model provides a starting point, but substantial fine-tuning or modification is required to bridge the domain gap.
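One possible sketch of this adaptation, assuming a torchvision ResNet-50 from which only the early, generic layers are retained; the 3-class head, dropout rate, and augmentations are illustrative choices rather than values from the paper:

```python
# Sketch: keep only the early, generic layers of the pre-trained network and
# train a new, smaller head - later layers are too source-specific to reuse.
import torch
from torchvision import models, transforms

base = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Retain conv1 through layer2 (edges, textures); discard the deeper layers.
early = torch.nn.Sequential(
    base.conv1, base.bn1, base.relu, base.maxpool, base.layer1, base.layer2
)
for p in early.parameters():
    p.requires_grad = False

head = torch.nn.Sequential(            # new task-specific head, trained from scratch
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Dropout(0.5),             # regularization for the small dataset
    torch.nn.Linear(512, 3),           # layer2 of ResNet-50 outputs 512 channels
)
model = torch.nn.Sequential(early, head)

# Heavy augmentation helps compensate for the limited target data.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```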
4) Scenario 4: Large, Different Target Dataset from the Base Training Dataset
A large and different target dataset allows flexibility for model adjustment and fine-tuning. This scenario benefits from the extensive capacity of the pre-trained model but requires more customization to accommodate the new domain.
5) Scenario 5: Initializing with a Pre-trained Network Instead of Random Initialization
Initializing a model with pre-trained weights rather than random initialization is foundational to transfer learning, leveraging existing knowledge from a model trained on a large base dataset. This approach accelerates convergence and enhances accuracy, particularly when the target task shares underlying features with the base task.
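A minimal sketch of the two initialization choices with torchvision's ResNet-50; both models would then be trained identically on the target task:

```python
# Sketch: the only difference between the two models is initialization -
# pre-trained ImageNet weights versus random weights.
from torchvision import models

pretrained_init = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
random_init = models.resnet50(weights=None)
# The pre-trained initialization typically converges faster and reaches
# higher accuracy when the target task shares features with the base task.
```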
6) Scenario 6: Freezing Weights of All but the Final Layer, Fine-tuning Only the Last Layer
Freezing weights for all layers, except the last fully connected layer, is a straightforward transfer learning approach suited for scenarios where the target dataset is small or differs minimally in structure.
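A minimal sketch, assuming a torchvision ResNet-50 and a hypothetical 10-class target task:

```python
# Sketch: freeze every layer, then replace and train only the final
# fully connected layer for the target classes.
import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze the whole network

num_target_classes = 10                          # hypothetical target task
model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)

# Only the new layer's parameters are passed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
```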
In deep learning, pre-trained models are often initialized with general features in the first layer, such as Gabor filters and color blobs, which are applicable across various datasets and domains. These general features serve as a foundational layer for transfer learning, allowing target domain-specific features to be trained more efficiently. For example, a model pre-trained on a Places dataset can use its learned edge and texture features for similar datasets, expediting the training process while reducing resource requirements. By leveraging pre-trained models, the time and computational resources needed to extract general features are significantly reduced, streamlining model development.
V. CONCLUSION
In this paper, we observed that building a machine-learning model from scratch is often resource-intensive, demanding significant time and computational power, especially in environments with limited hardware or software resources. Transfer learning offers a valuable solution by significantly improving model performance, accuracy, and learning efficiency, as it leverages knowledge from pre-trained models rather than starting anew. By accelerating training time, transfer learning enables further exploration into optimizing learning rates and enhancing model robustness. It also provides measurable accuracy gains over baseline performance, with established models like ResNet50 [31], VGG16 [32], VGG19 [33], and InceptionV3 [34] demonstrating reliable improvements in diverse applications.
[1] Rina Dechter (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory.
[2] Ivakhnenko, Alexey (1971). "Polynomial theory of complex systems". IEEE Transactions on Systems, Man and Cybernetics. 1(4): 364–378.
[3] Fukushima, K. (1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biol. Cybern. 36(4): 193–202.
[4] LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, 1, pp. 541–551, 1989.
[5] Pratt, L. Y. (1993). "Discriminability-based transfer between neural networks". NIPS Conference: Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers. pp. 204–211.
[6] Baxter, J., "Theoretical Models of Learning to Learn", pp. 71–95, in Pratt & Thrun 1998.
[7] Pratt, L. (1996). "Special Issue: Reuse of Neural Networks through Transfer". Connection Science. Retrieved 2017-08-10.
[8] Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012). Foundations of Machine Learning. The MIT Press. ISBN 9780262018258.
[9] Jordan, Michael I.; Bishop, Christopher M. (2004). "Neural Networks". In Allen B. Tucker, Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, Florida: Chapman & Hall/CRC Press LLC. ISBN 1-58488-360-X.
[10] Claude Sammut and Geoffrey I. Webb. Encyclopedia of Machine Learning, pp. 159–162.
[11] Yoshua Bengio, Ian J. Goodfellow, Aaron Courville (2015). Deep Learning, pp. 183–200.
[12] Marcel van Gerven, Sander Bohte. "Artificial Neural Networks as Models of Neural Information Processing". Frontiers Research Topic.
[13] Indra den Bakker. Python Deep Learning Cookbook. Packt Publishing, pp. 173–189. ISBN 978-1-78712-519-3.
[14] Yoshua Bengio, Ian J. Goodfellow, Aaron Courville (2015). Deep Learning, pp. 382–384.
[15] Giancarlo Zaccone, Md. Rezaul Karim, Ahmed Menshawy. Deep Learning with TensorFlow, p. 98. ISBN 978-1-78646-978-6.
[16] Claude Sammut and Geoffrey I. Webb. Encyclopedia of Machine Learning, p. 99.
[17] Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). "Generative Adversarial Networks".
[18] Kohonen, Teuvo; Honkela, Timo (2007). "Kohonen Network". Scholarpedia. http://www.scholarpedia.org/article/Kohonen_network
[19] Emilio Soria Olivas, José David Martín Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, Antonio José Serrano López. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pp. 242–264. Information Science Reference. ISBN 978-1-60566-767-6.
[20] Emilio Soria Olivas, José David Martín Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, Antonio José Serrano López. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pp. 245–246. Information Science Reference. ISBN 978-1-60566-767-6.
[21] A. Arnold, R. Nallapati, and W. W. Cohen, "A comparative study of methods for transductive transfer learning," in Proceedings of the 7th IEEE International Conference on Data Mining Workshops. Washington, DC, USA: IEEE Computer Society, 2007, pp. 77–82.
[22] Sinno Jialin Pan, Qiang Yang, "A Survey of Transfer Learning". https://ieeexplore.ieee.org/document/5288526/
[23] Lisa Torrey and Jude Shavlik, "Transfer Learning", University of Wisconsin, Madison, WI, USA.
[24] Places 205 Dataset - http://places.csail.mit.edu/user/download.php
[25] Mini Places - https://www.kaggle.com/c/cs280-mini-places/rules
[26] Places365 - http://places2.csail.mit.edu/download.html
[27] ImageNet - http://image-net.org/index
[28] CIFAR-10 Dataset - https://www.cs.toronto.edu/~kriz/cifar.html
[29] CIFAR-100 Dataset - https://www.cs.toronto.edu/~kriz/cifar.html
[30] STL-10 Dataset - https://cs.stanford.edu/~acoates/stl10/
[31] ResNet50 - https://www.kaggle.com/dansbecker/transfer-learning/data
[32] VGG16 Model in Kaggle - https://www.kaggle.com/keras/vgg16
[33] VGG19 Model in Kaggle - https://www.kaggle.com/keras/vgg19
[34] InceptionV3 Model in Kaggle - https://www.kaggle.com/google-brain/inception-v3
[35] Souza, M.D., Prabhu, G.A., Kumara, V. et al. "EarlyNet: a novel transfer learning approach with VGG11 and EfficientNet for early-stage breast cancer detection". Int J Syst Assur Eng Manag (2024). https://doi.org/10.1007/s13198-024-02408-6
[36] Melwin D'souza, Ananth Prabhu Gurpur, Varuna Kumara, "SANAS-Net: spatial attention neural architecture search for breast cancer detection", IAES International Journal of Artificial Intelligence (IJ-AI), Vol. 13, No. 3, September 2024, pp. 3339–3349, ISSN: 2252-8938, DOI: http://doi.org/10.11591/ijai.v13.i3.pp3339-3349
[37] Melwin D Souza, Ananth Prabhu G and Varuna Kumara, "A Comprehensive Review on Advances in Deep Learning and Machine Learning for Early Breast Cancer Detection", International Journal of Advanced Research in Engineering and Technology (IJARET), 10(5), 2019, pp. 350–359.
Copyright © 2024 N. Balakumar, R. Amsaveni, K. Brindha, S. Sasikala Devi, N. Manoharan, Mani Gopalsamy, L. Sivakami. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET64911
Publish Date : 2024-10-30
ISSN : 2321-9653
Publisher Name : IJRASET